Skip to content

Fix/desc construction#342

Merged
GalAshkenazi1 merged 11 commits into
developfrom
fix/desc_construction
May 20, 2026
Merged

Fix/desc construction#342
GalAshkenazi1 merged 11 commits into
developfrom
fix/desc_construction

Conversation

@GalAshkenazi1
Copy link
Copy Markdown
Contributor

Fix records= path producing 'None' descriptions + unified description construction


Main Bug Fixed

When constructing EEGDashDataset via records=, every recording's description was
the string "None". The records= path never built or forwarded a description dict to
EEGDashRaw; braindecode then stringified the missing None default.


Changes

eegdash/dataset/dataset.py

  • Bug fix: records= path now builds and passes a description dict to each
    EEGDashRaw, matching the behaviour of the query and offline paths.

  • Unified _build_description helper: All three construction paths (records=,
    offline, query) previously built descriptions independently. A shared _build_description
    method now covers all three, eliminating divergence. It:

    • Pre-fills every requested field with None (so desc["subject"] never raises KeyError)
    • Looks up fields via _find_key_in_nested_dict (recursive, case/separator-insensitive,
      handles both v1 and v2 record formats)
    • Merges participant_tsv data with configurable precedence (see below)
    • Emits a DEBUG log when a conflict is detected
  • Configurable precedence (description_precedence): New constructor parameter controls
    which source wins when the same field appears in both the record and participant_tsv:

    • "record" (default) — existing behaviour, record-level value is kept
    • "participant_tsv" — participant_tsv overwrites the record value, including None
      (documented intentional behaviour: choosing this mode means fully trusting that source)
  • All-None field warning: After construction, emits a WARNING if any
    description_fields entry is None across all recordings — surfaces typos like
    "sbject" early.

  • _normalize_records called on the full batch: Preserves deduplication correctness
    (per-record calls broke _dedupe_records).

tests/unit_tests/dataset/test_build_description.py (new file)

Test What it checks
test_build_description_precedence_conflict Default "record" mode: top-level value wins, "kept" logged
test_build_description_missing_fields_padding Absent fields → None, no KeyError
test_build_description_key_insensitivity "Subject-ID" in record maps to "subject_id" in fields
test_dataset_initialization_path_parity All three init paths produce identical description DataFrames
test_build_description_participant_tsv_precedence "participant_tsv" mode: tsv value wins; None in tsv overwrites real value
test_dataset_invalid_description_precedence Unknown description_precedence raises ValueError at construction

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

📚 Documentation Preview

📦 Download Documentation Artifact

Download the documentation-html artifact from the workflow run to view the docs locally.

💡 To enable live previews, add a SURGE_TOKEN secret to this repository. See surge.sh for setup instructions.

**base_dataset_kwargs,
)
)
for record in self.records
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for record in self.records
datasets = [
EEGDashRaw(
record,
self.cache_dir,
description=self._build_description(record, description_fields),
**base_dataset_kwargs,
)
for record in self.records
]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimizing the diff.

Comment thread eegdash/dataset/dataset.py Outdated
record: dict[str, Any],
description_fields: list[str],
participants_row: dict[str, Any] | None = None,
) -> dict[str, Any]:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is more verbose than necessary for this function, and I would move to an auxiliary place, some utilities. This way, the dataset object does not deliver this function, which is only used once.

Comment thread eegdash/dataset/dataset.py Outdated
Comment on lines +584 to +599
def _build_description(
self,
record: dict[str, Any],
description_fields: list[str],
participants_row: dict[str, Any] | None = None,
) -> dict[str, Any]:
"""Build a description dict for a single record.

Extracts values for each requested field from the record, then merges
participant data from either an explicit ``participants_row`` (offline
path, from a local ``participants.tsv``) or the embedded
``participant_tsv`` key inside the record (online paths). Fields still
absent after the merge are set to ``None`` so the schema is always
complete. When both the record and participant data carry the same
field, precedence is determined by ``self._description_precedence``; a
``debug``-level log is emitted when the values differ.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can compact much more function too.

Comment thread eegdash/dataset/dataset.py Outdated
participants_row=part,
description_fields=description_fields,
)
description = self._build_description(record, description_fields)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice simplification

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the test suite, we need to do much more compact, use parametrization, and use a more pytest-style.

@GalAshkenazi1
Copy link
Copy Markdown
Contributor Author

Fixed Brunos notes

eegdash/bids_metadata.py

  • find_key_in_nested_dict — made standalone.
  • build_description — compacted from ~80 lines to ~25, now takes description_precedence explicitly.

eegdash/dataset/dataset.py

  • Imports: dropped merge_participants_fields and normalize_key, added build_description
  • records= path: now in list comprehension
  • Offline path: build_description(record, ..., self._description_precedence, part_row) — one line
  • Query path: same one-liner
  • Moved _build_description method (~80 lines)
  • Moved _find_key_in_nested_dict method (~30 lines)

tests/unit_tests/dataset/test_build_description.py

  • Test 2: rewritten with @pytest.mark.parametrize covering all 3 conflict scenarios, calls build_description directly.
  • Test 3: rewritten with @pytest.mark.parametrize over 4 invalid inputs.

tests/unit_tests/dataset/test_dataset.py

  • test_dataset_gaps: removed stale _find_key_in_nested_dict setup, added _description_precedence, fixed patch target to eegdash.bids_metadata.
  • test_dataset_init_exception_gap: fixed patch target.
  • test_iterate_local_with_participants_exception: import from bids_metadata.
  • test_dataset_find_key_nested + test_dataset_recursive_search: use find_key_in_nested_dict directly.

Comment thread eegdash/dataset/dataset.py Outdated
auth_token: str | None = None,
on_error: str = "raise",
max_concurrency: int = 20,
description_precedence: str = "record",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description_precedence: str = "record",
description_precedence: str = "participant_tsv",

Copy link
Copy Markdown
Collaborator

@bruAristimunha bruAristimunha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small detail, and let's go!

@GalAshkenazi1 GalAshkenazi1 merged commit 3309f14 into develop May 20, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants